Anotación semiautomática con papeles temáticos de los corpus CESS-ECE [Semi-automatic annotation of the CESS-ECE corpora with thematic roles]

Authors

Maria Antònia Martí

Mariona Taulé

Lluís Màrquez i Villodre

Manuel Bertrán

Abstract

In this paper we present the methodology followed in the automatic semantic annotation (argument structure and thematic roles of the verbal predicates) of the CESS-ECE CAT/ESP corpora. Building on a verbal lexicon (1,482 entries) that records the syntactic functions of each verb and their projection onto arguments and thematic roles, we present a set of simple rules for automatically enriching syntactic trees with semantic information. This procedure allows 60% of the expected arguments and thematic roles to be annotated automatically with a low error rate (below 2%). Given the high quality of these results, we claim that the methodology substantially reduces manual annotation effort and makes a semi-automatic approach to corpus annotation feasible. Once completed, the CESS-ECE corpus will allow researchers to develop complete systems for automatic Semantic Role Labeling of Catalan and Spanish.
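The abstract does not spell out the rules themselves. The following Python sketch shows one plausible shape such a rule could take, under loudly stated assumptions: a toy lexicon (hypothetical entries; the AnCora-style function labels `SUJ`, `CD`, `CI` and role labels `agt`, `pat`, `ben` are assumed for illustration) maps each syntactic function of a verb frame to an (argument, thematic role) pair, and a constituent is labelled automatically only when every frame of the verb that licenses that function agrees on the label. Ambiguous constituents are left for manual annotation. The names `LEXICON`, `project`, `annotate` and `coverage` are hypothetical, not the authors' code.

```python
from typing import Dict, List, Optional, Tuple

Label = Tuple[str, str]  # (argument position, thematic role), e.g. ("arg0", "agt")

# Hypothetical toy lexicon (assumption): verb lemma -> list of frames, each frame
# mapping a syntactic function label to an (argument, thematic role) pair.
LEXICON: Dict[str, List[Dict[str, Label]]] = {
    "dar": [
        {"SUJ": ("arg0", "agt"), "CD": ("arg1", "pat"), "CI": ("arg2", "ben")},
    ],
    "abrir": [
        {"SUJ": ("arg0", "agt"), "CD": ("arg1", "pat")},  # transitive use
        {"SUJ": ("arg1", "pat")},                         # anticausative use
    ],
}


def project(verb: str, function: str) -> Optional[Label]:
    """Return the (argument, role) pair for a syntactic function only if all
    frames of the verb that contain this function agree; otherwise None,
    meaning the constituent is left for manual annotation."""
    candidates = {frame[function] for frame in LEXICON.get(verb, []) if function in frame}
    return candidates.pop() if len(candidates) == 1 else None


def annotate(verb: str, constituents: List[Tuple[str, str]]) -> List[Tuple[str, str, Optional[Label]]]:
    """constituents: (syntactic function, text) pairs attached to the verb."""
    return [(func, text, project(verb, func)) for func, text in constituents]


def coverage(rows: List[Tuple[str, str, Optional[Label]]]) -> float:
    """Fraction of constituents that received an automatic label."""
    return sum(label is not None for _, _, label in rows) / len(rows)


if __name__ == "__main__":
    rows = annotate("abrir", [("SUJ", "Juan"), ("CD", "la puerta")])
    for func, text, label in rows:
        print(func, text, label)
    # SUJ Juan None                 (ambiguous: arg0-agt vs. arg1-pat)
    # CD la puerta ('arg1', 'pat')
    print(coverage(rows))  # 0.5
```

Leaving ambiguous constituents unlabelled trades coverage for precision, which fits the abstract's pairing of partial coverage (60%) with a low error rate, although the paper's actual rule set may differ from this sketch.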

Similar resources

How Does the Granularity of an Annotation Scheme Influence Dependency Parsing Performance?

The common use of a single de facto standard annotation scheme for dependency treebank creation leaves the question open to what extent the performance of an application trained on a treebank depends on this annotation scheme and whether a linguistically richer scheme would imply a decrease of the performance of the application. We investigate the effect of the variation of the number of gramma...

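Internet como fuente de información léxica: extracción de etiquetas de dominio y detección de nuevos sentidos [The Internet as a source of lexical information: extraction of domain labels and detection of new senses]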

Abstract. We describe an algorithm that combines lexical information (extracted from WordNet 1.6) with information on the Internet (AltaVista directories) to automatically characterize the senses of a word with domain labels and, at the same time, to detect and describe new senses that are relevant on the Internet. This information can be used, among other things, to enrich databases...

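Aspectos ortográficos, léxicos y morfosintácticos del etiquetado lingüístico de un corpus de informática en lengua gallega [Orthographic, lexical and morphosyntactic aspects of the linguistic tagging of a computing corpus in Galician]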

Abstract. This work examines some aspects of the linguistic tagging of a technical computing corpus in Galician, with respect to orthographic, lexical and morphosyntactic issues. First, we present the characteristics of the analysed corpus and some applications of its processing. Next, we show the techniques employed in its annotation mo...

IARG-AnCora: Anotación de los corpus AnCora con argumentos implícitos [IARG-AnCora: Annotation of the AnCora corpora with implicit arguments]

IARG-AnCora aims to annotate the implicit arguments of deverbal nominalizations in the AnCora corpus. This corpus will serve as the basis for automatic semantic role labeling systems based on machine learning techniques. Semantic analyzers are essential components of current language technology applications, in which it is important to obtain a deeper understanding of the text in order to make inferen...

Anotación semántica de los sustantivos del corpus SenSem [Semantic annotation of the nouns in the SenSem corpus]

The main goal of this project is the semantic annotation of the argument nouns of the SenSem corpus with WordNet synsets. The ultimate aim of the research is the acquisition of semantic preferences.

Journal:

Procesamiento del Lenguaje Natural

Volume 38

Publication date: 2007